| Condition No. | Condition Name | Donor 1 | Donor 2 | Donor 3 | doublet | unassigned |
|---|---|---|---|---|---|---|
| 1 | Ex vivo | 7,029 | 8,416 | 8,453 | 3,925 | 129 |
| 2 | Early T-cells unexposed | 7,986 | 8,737 | 9,396 | 7,150 | 912 |
| 3 | Early T-cells exposed | 6,746 | 3,029 | 5,600 | 2,939 | 365 |
| 4 | Late T-cells unexposed | 2,249 | 3,688 | 1,645 | 2,384 | 34 |
| 5 | Late T-cells exposed | 1,359 | 3,095 | 5,095 | 1,095 | 251 |
T-cell scRNAseq Report
Alistair Bailey
2023-07-20
Introduction
This is an informal report of the results of scRNAseq of T-cells from three donors under five conditions, two of which involve neutrophil exposure.
The samples were prepared with the three donors multiplexed in one run per experimental condition, for gene expression (RNA) and antibody capture with the TotalSeq B cocktail (“TotalSeq-b Human Universal Cocktail, V1.0,” n.d.) targeting 134 cell surface antigens (protein).
The analysis strategy I settled upon was to perform trajectory inference as described by Roux de Bézieux et al. (Bézieux et al., n.d.). The motivation is that we aim to capture the continuum of cell states present in the data, both within and across experimental conditions. A trajectory is a dynamic process represented as a directed graph, where a lineage is a distinct path along the graph, such that a single trajectory can split into multiple lineages. Pseudo-time represents the distance along a path (lineage).
This aims to address three overarching questions:
- Are dynamic differences between experimental conditions for each donor best characterised by a single or multiple trajectories?
- Are there differences in the progression of the cell states in each condition along and between the lineage(s)?
- Are there differences in the gene expression associated with each condition and lineage?
The overall aim is to identify changes in T-cell states following neutrophil exposure in terms of differential progression and gene expression.
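The definitions of trajectory, lineage and pseudo-time given above can be made concrete with a toy example. This is only a minimal sketch: the state names and edge lengths are hypothetical and are not taken from the data. It treats a trajectory as a directed graph, enumerates each root-to-leaf path as a lineage, and records pseudo-time as the cumulative distance along that path.

```python
from collections import defaultdict

# Hypothetical toy trajectory: nodes are cell states, and each edge carries
# the distance between two states in the reduced-dimension space.
EDGES = [
    ("naive", "activated", 2.0),
    ("activated", "effector", 3.0),
    ("activated", "memory", 1.5),
]


def build_graph(edges):
    """Map each parent state to its list of (child, distance) pairs."""
    graph = defaultdict(list)
    for parent, child, dist in edges:
        graph[parent].append((child, dist))
    return graph


def lineages(graph, root):
    """Return every root-to-leaf path as a list of (state, pseudo-time)."""
    paths = []
    stack = [(root, [(root, 0.0)])]
    while stack:
        node, path = stack.pop()
        children = graph.get(node, [])
        if not children:
            # A leaf closes one lineage.
            paths.append(path)
            continue
        for child, dist in children:
            # Pseudo-time accumulates the distance travelled from the root.
            stack.append((child, path + [(child, path[-1][1] + dist)]))
    # Sort by state names so the output order is deterministic.
    return sorted(paths, key=lambda p: [name for name, _ in p])


for number, path in enumerate(lineages(build_graph(EDGES), "naive"), start=1):
    print(f"Lineage {number}:", path)
```

Here the single trajectory splits after the "activated" state into two lineages, and the two leaves sit at different pseudo-times because their paths have different lengths.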
Sequencing aimed to capture 10,000 cells per donor per condition. Experimental conditions were:
- Ex vivo T-cells.
- Early unexposed T-cells. These are T-cells treated with activation cocktail for a short time, but not exposed to neutrophils.
- Early exposed T-cells. These are T-cells treated with activation cocktail for a short time, and exposed to neutrophils.
- Late unexposed T-cells. These are T-cells treated with activation cocktail for a longer duration, but not exposed to neutrophils.
- Late exposed T-cells. These are T-cells treated with activation cocktail for a longer duration, and exposed to neutrophils.
Results summary
Here all the conditions are analysed together per donor, but the results below suggest focusing on conditions 1-3 for donors 1 and 3, and possibly analysing their CD4+ and CD8+ T-cells separately.
The outcome of the trajectory and differential expression analysis suggests that the experimental conditions are sufficiently different from one another to fit individual trajectories, but that they follow a common skeleton. There are differences in the progression and gene expression along the lineages for donors 1 and 3. Lineage 1 carries the most “weight” and is the longest, so I focus on Lineage 1 for differential gene expression in donors 1 and 3.
At a false discovery rate (FDR) threshold of 5%, donor 1 and donor 3 have 1248 and 875 differentially expressed (DE) genes respectively in Lineage 1 across all five experimental conditions. 721 of these genes are shared between the two donors. The table at the end is searchable if you want to browse through for genes of interest.
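To put the overlap in context, the three counts quoted above can be turned into simple proportions. This is a small arithmetic sketch; the function name is my own and only the three numbers come from the report.

```python
def overlap_summary(n_a, n_b, n_shared):
    """Summarise the overlap between two gene lists from their sizes."""
    union = n_a + n_b - n_shared
    return {
        "union": union,                 # genes DE in at least one donor
        "frac_a": n_shared / n_a,       # share of list A also found in B
        "frac_b": n_shared / n_b,       # share of list B also found in A
        "jaccard": n_shared / union,    # shared genes over the union
    }


# Donor 1: 1248 DE genes, donor 3: 875 DE genes, 721 shared (Lineage 1).
summary = overlap_summary(1248, 875, 721)
for name, value in summary.items():
    print(name, round(value, 3))
```

In other words, a little over half of donor 1's DE genes and roughly four fifths of donor 3's DE genes fall in the shared set.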
Understanding what is driven by the activation reagents and what can be attributed to neutrophil exposure will take a bit more work, hence my conclusion that it may be more fruitful to focus on donors 1 and 3 and only conditions 1 to 3.
Analysis workflow
The analysis workflow has two parts:
Preparation for trajectory analysis
Part 1 has the following steps:
- Run 10x Genomics cellranger multi to align the gene expression sequences to the GRCh38 reference genome and identify the corresponding TotalSeq antibody capture molecular identifiers for each of the 5 runs.
- Demultiplex the 5 sequencing runs to assign the cells to their respective donors using 1000 Genomes GRCh38 SNP reference, Demuxafy (Neavin et al., n.d.) and Vireo (Huang, McCarthy, and Stegle 2019). This step also identifies doublets for removal. We are only interested in singlets.
- Quality control each of the 5 runs for dead or dying cells, lysed cells, or empty cells using Seurat (Hao et al. 2021b) and remove doublets identified by Vireo.
- Split and recombine the five QC’d runs to create one object per donor, each object containing the five datasets for the five experimental conditions.
- For each donor dataset, annotate the cell types using the Celldex curated cell reference (Aran et al. 2019).
- Use both the gene expression and TotalSeq surface protein counts to cluster the cells using a Weighted Nearest Neighbours Analysis for each dataset using Seurat (Hao et al. 2021a). This normalises the data and reduces the dimensionality.
- Integrate the five datasets into one dataset per donor.
- Identify the T-cell clusters and subset the data to remove any other cells e.g. B-cells, and re-cluster the data to create one integrated T-cell only dataset per donor containing the five experimental conditions.
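The demultiplexing, filtering and split-by-donor steps above were done with Vireo and Seurat. The Python sketch below only illustrates the logic and is not the code used for this report. The thresholds `MIN_GENES` and `MAX_PCT_MITO` are hypothetical placeholders, and the `Cell` record is an assumed simplification of the per-cell metadata. The donor labels follow the Vireo convention of `donor0`, `donor1`, ..., `doublet` and `unassigned`.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Cell:
    barcode: str
    donor_id: str    # Vireo-style label: "donor0", ..., "doublet", "unassigned"
    n_genes: int     # number of genes detected in the cell
    pct_mito: float  # percentage of counts from mitochondrial genes


# Hypothetical QC thresholds, chosen for illustration only.
MIN_GENES = 200
MAX_PCT_MITO = 10.0


def is_singlet(cell):
    """Keep only cells confidently assigned to a single donor."""
    return cell.donor_id not in {"doublet", "unassigned"}


def passes_qc(cell):
    """Drop likely empty droplets (few genes) and dying cells (high mito)."""
    return cell.n_genes >= MIN_GENES and cell.pct_mito <= MAX_PCT_MITO


def split_by_donor(cells):
    """Group the singlets that pass QC by their donor label."""
    per_donor = defaultdict(list)
    for cell in cells:
        if is_singlet(cell) and passes_qc(cell):
            per_donor[cell.donor_id].append(cell)
    return dict(per_donor)


run = [
    Cell("AAAC", "donor0", 1500, 3.2),
    Cell("AAAG", "doublet", 4000, 2.1),
    Cell("AAAT", "donor1", 120, 1.0),    # too few genes
    Cell("AACA", "donor1", 1800, 25.0),  # high mitochondrial fraction
    Cell("AACC", "donor2", 2100, 4.5),
]
print({donor: len(cells) for donor, cells in split_by_donor(run).items()})
```

Repeating this for each of the five runs and then collecting the cells by donor gives one object per donor containing all five conditions, as in the split-and-recombine step above.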
Trajectory analysis
Part 2 has the following steps, using the workflow created by Roux de Bézieux et al. (Bézieux et al., n.d.).
- Identify lineage structure within integrated dataset for each donor.
- Test whether to fit a single trajectory for the five conditions or one trajectory per condition. In every case the outcome was one trajectory per condition.
- Fit trajectories.
- Test for differential progression along lineages and between conditions.
- Test for differential expression along lineages and between conditions.
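To give a feel for what a differential progression test measures, the sketch below compares the pseudo-time distributions of two conditions with a two-sample Kolmogorov-Smirnov statistic. This is an illustration only: the report does not state which test the workflow applied, and the pseudo-time values here are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest gap between the empirical CDFs of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    max_gap = 0.0
    # The gap can only change at observed values, so check each of them.
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


# Hypothetical pseudo-times of cells along one lineage in two conditions.
unexposed = [0.1, 0.4, 0.6, 0.9]
exposed = [0.5, 0.7, 0.8, 0.95]
print(ks_statistic(unexposed, exposed))
```

A statistic near 0 means the cells of both conditions are spread along the lineage in the same way, while a value near 1 means one condition sits almost entirely earlier or later in pseudo-time than the other.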
Results
Cell numbers
The main thing to note in Table 1, Table 2 and Table 3 is the variation in cell numbers across the different donors and conditions. Notably:
- The cell numbers recovered increase upon initial treatment with the T-cell activation reagent (compare conditions 1 & 2, Table 1 and Table 2).
- Something has happened with condition 4 that has led to much lower numbers of cells being recovered. This makes comparison with the other conditions difficult.
- Neutrophil exposure generally reduces the cell numbers, presumably by killing the cells. The exception is donor 3 at the late time point, where more cells were recovered in condition 5 than in condition 4.
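The effect of neutrophil exposure on cell recovery can be checked directly from the per-donor counts in Table 1. The short sketch below computes the ratio of exposed to unexposed cells at each time point. The counts are copied from Table 1, while the function and variable names are my own.

```python
# Per-donor cell counts (Donor 1, Donor 2, Donor 3) copied from Table 1.
TABLE_1 = {
    "Early T-cells unexposed": (7986, 8737, 9396),
    "Early T-cells exposed": (6746, 3029, 5600),
    "Late T-cells unexposed": (2249, 3688, 1645),
    "Late T-cells exposed": (1359, 3095, 5095),
}


def exposed_to_unexposed(stage):
    """Ratio of exposed to unexposed cell counts per donor for one stage."""
    unexposed = TABLE_1[f"{stage} T-cells unexposed"]
    exposed = TABLE_1[f"{stage} T-cells exposed"]
    return [round(e / u, 2) for e, u in zip(exposed, unexposed)]


for stage in ("Early", "Late"):
    print(stage, exposed_to_unexposed(stage))
```

A ratio below 1 means fewer cells were recovered after exposure. All three donors are below 1 at the early time point, whereas at the late time point donor 3 is well above 1, which probably reflects the low recovery in condition 4 more than any effect of exposure.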
Table 1 shows the numbers of cells recovered following demultiplexing:
Table 2 shows the numbers of cells recovered following the quality control steps 1 to 3 of the workflow:
| Condition No. | Condition Name | Donor 1 | Donor 2 | Donor 3 |
|---|---|---|---|---|
| 1 | Ex vivo | 6,194 | 7,490 | 7,773 |
| 2 | Early T-cells unexposed | 7,754 | 8,263 | 8,848 |
| 3 | Early T-cells exposed | 5,778 | 2,563 | 5,065 |
| 4 | Late T-cells unexposed | 2,153 | 3,472 | 1,570 |
| 5 | Late T-cells exposed | 1,215 | 2,506 | 4,183 |
Table 3 shows the numbers of cells remaining once the data have been filtered to keep only the T-cells, broken down by T-cell type:
| Condition No. | Condition Name | Cell type | Donor 1 | Donor 2 | Donor 3 |
|---|---|---|---|---|---|
| 1 | Ex vivo | CD4+ T-cells | 4,153 | 5,074 | 4,023 |
| 1 | Ex vivo | CD8+ T-cells | 2,041 | 2,416 | 3,750 |
| 2 | Early T-cells unexposed | CD4+ T-cells | 5,378 | 4,096 | 5,731 |
| 2 | Early T-cells unexposed | CD8+ T-cells | 2,376 | 4,167 | 3,117 |
| 3 | Early T-cells exposed | CD4+ T-cells | 3,601 | 1,723 | 2,739 |
| 3 | Early T-cells exposed | CD8+ T-cells | 2,177 | 840 | 2,326 |
| 4 | Late T-cells unexposed | CD4+ T-cells | 1,508 | 1,870 | 1,070 |
| 4 | Late T-cells unexposed | CD8+ T-cells | 645 | 1,602 | 500 |
| 5 | Late T-cells exposed | CD4+ T-cells | 652 | 1,542 | 2,733 |
| 5 | Late T-cells exposed | CD8+ T-cells | 563 | 964 | 1,450 |
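Since analysing CD4+ and CD8+ T-cells separately is one possible next step, it may help to look at the balance between the two in Table 3. The sketch below computes the CD4+ fraction for every donor and condition and lists the cases where CD8+ cells are in the majority. The counts are copied from Table 3, and the helper names are my own.

```python
# (CD4+, CD8+) counts for Donor 1, Donor 2, Donor 3, copied from Table 3.
TABLE_3 = {
    "Ex vivo": [(4153, 2041), (5074, 2416), (4023, 3750)],
    "Early T-cells unexposed": [(5378, 2376), (4096, 4167), (5731, 3117)],
    "Early T-cells exposed": [(3601, 2177), (1723, 840), (2739, 2326)],
    "Late T-cells unexposed": [(1508, 645), (1870, 1602), (1070, 500)],
    "Late T-cells exposed": [(652, 563), (1542, 964), (2733, 1450)],
}


def cd4_fraction(cd4, cd8):
    """Fraction of T-cells annotated as CD4+."""
    return cd4 / (cd4 + cd8)


# Conditions and donors where CD8+ cells outnumber CD4+ cells.
cd8_majority = [
    (condition, f"Donor {i + 1}")
    for condition, donors in TABLE_3.items()
    for i, (cd4, cd8) in enumerate(donors)
    if cd4_fraction(cd4, cd8) < 0.5
]

for condition, donors in TABLE_3.items():
    print(condition, [round(cd4_fraction(*pair), 2) for pair in donors])
print("CD8+ majority:", cd8_majority)
```

CD4+ cells are the majority in every case except donor 2 in condition 2, where the two types are close to equal.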
Creation of donor integrated T-cell datasets
As described above each dataset was quality controlled to remove dead cells and doublets, then clustered using information from both the gene expression and antibody tags, and annotated for cell type using reference annotation data. The five conditions for each donor were then integrated into a single dataset per donor, filtered for just CD4+ and CD8+ T-cells and re-clustered ready for further analysis.
Figure 1, Figure 2 and Figure 3 show the final clustering for each donor and each respective experimental condition. Each dot is a cell and the numbers/density correspond with Table 3.
Trajectory inference across multiple conditions with condiments: differential topology, progression, differentiation, and expression
To recap from the introduction, a trajectory is a dynamic process represented as a directed graph, where a lineage is a distinct path along the graph, such that a single trajectory can split into multiple lineages. Pseudo-time represents the distance along a path (lineage).
This aims to address three overarching questions:
- Are dynamic differences between experimental conditions for each donor best characterised by a single or multiple trajectories?
- Are there differences in the progression of the cell states in each condition along and between the lineage(s)?
- Are there differences in the gene expression associated with each condition and lineage?
The overall aim is to identify changes in T-cell states following neutrophil exposure in terms of differential progression and gene expression.
For brevity I’m just showing the fitted trajectories, one per experimental condition, so you can see the similarities and differences in Figure 4, Figure 5 and Figure 6.
Differential progression
It’s quite tricky to visualise the differences in lineages, but I’m showing two ways: density plots (Figure 7, Figure 8, Figure 9) and weight plots (Figure 10, Figure 11). Note that as there is only one lineage for donor 2, all the weight is in that lineage by definition. These plots show the general variation in the cell states in pseudotime between conditions and lineages, and that lineage 1 is the longest in the two donors (1 and 3) with multiple lineages. Lineage 1 also carries the most weight and is therefore the focus of the differential gene expression analysis in the next step.
Differential gene expression
The heat-maps here should be read left to right, corresponding to movement along the lineage in pseudotime. They show how the expression values of the genes filtered at 5% FDR in Lineage 1 change in each condition.
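For readers unfamiliar with FDR filtering, the sketch below shows one standard way of doing it, the Benjamini-Hochberg adjustment. This is an assumption for illustration: the report does not state which adjustment method was used, and the gene names and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene p-values from a differential expression test.
genes = ["geneA", "geneB", "geneC", "geneD"]
pvals = [0.01, 0.04, 0.03, 0.20]

adjusted = benjamini_hochberg(pvals)
kept = [g for g, q in zip(genes, adjusted) if q <= 0.05]
print(kept)
```

Filtering at 5% FDR then means keeping the genes whose adjusted p-value is at most 0.05, so that about 5% of the retained genes are expected to be false positives.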
Finally, here is a table of the 721 DE genes in Lineage 1, filtered at 5% FDR, that are shared between donors 1 and 3.